MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.

Internal identifier: 000183 (Main/Exploration); previous: 000182; next: 000184


Authors: Zongyu Wang [People's Republic of China]; Wenying He [People's Republic of China]; Jijun Tang [People's Republic of China]; Fei Guo [People's Republic of China]


RBID : pubmed:31944107

Abstract

Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast is a single-celled fungus and a vital model organism for studying transcription and translation in basic biology. The transcriptional control of yeast cells has been extensively studied with both traditional methods and high-throughput technologies. However, the identities of the transcription factors that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that can accurately identify efficient transcription factor binding sites among a large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs, covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. Then, we employed five feature representation methods and adopted four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and applied a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass classification problem. In our experiments, the method achieved the best performance, with an overall accuracy of 79.72% and a Matthews correlation coefficient of 0.77. We identified similarity relationships among the categories of different TF families and obtained sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.
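The abstract states that the binding profiles cover all possible contiguous 8-mers and that samples were chosen with an E-score threshold. The following is a minimal Python sketch, not the authors' code from the repository above. It enumerates the 8-mer space and collapses each 8-mer with its reverse complement into one representative, as is common for protein-binding-microarray (PBM) 8-mer tables. The selection helper `select_high_affinity` and its 0.45 cutoff are hypothetical, because the abstract does not give the threshold value. PBM E-scores range from -0.5 to +0.5.

```python
from itertools import product

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int = 8) -> list[str]:
    """Enumerate every contiguous k-mer over ACGT, keeping one representative
    (the lexicographically smaller string) of each reverse-complement pair."""
    kmers = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        kmers.add(min(kmer, reverse_complement(kmer)))
    return sorted(kmers)


def select_high_affinity(escores: dict[str, float], threshold: float = 0.45) -> list[str]:
    """Keep the k-mers whose E-score is at or above the threshold.
    Hypothetical helper: the 0.45 default is an assumption, not the paper's value."""
    return sorted(kmer for kmer, score in escores.items() if score >= threshold)


all_8mers = canonical_kmers(8)
print(len(all_8mers))  # 32896
print(select_high_affinity({"ACGTACGT": 0.48, "AAAAAAAA": 0.12}))  # ['ACGTACGT']
```

The count follows from 4^8 = 65,536 sequences, of which 4^4 = 256 are their own reverse complement, giving (65,536 + 256) / 2 = 32,896 distinct 8-mers.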
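The classification stage trains 16 binary classifiers (XGBoost in the paper) under a one-vs-rest scheme and reports accuracy and the Matthews correlation coefficient (MCC). The sketch below leaves the binary classifier abstract. It shows three things: how multiclass labels split into one-vs-rest targets, how per-class scores can be combined by picking the most confident classifier (an assumed combination rule), and how to compute accuracy and the multiclass MCC generalization (Gorodkin's R_K, the same formula scikit-learn's `matthews_corrcoef` uses for more than two classes). The abstract does not say whether the paper reports this multiclass MCC or an average of per-class MCCs, and all function names here are hypothetical.

```python
import math
from collections import Counter


def one_vs_rest_targets(labels: list[int], n_classes: int) -> list[list[int]]:
    """Turn one multiclass label vector into n_classes binary vectors:
    in binary task k, class k is positive (1) and every other class is 0."""
    return [[1 if y == k else 0 for y in labels] for k in range(n_classes)]


def one_vs_rest_predict(scores: list[list[float]]) -> list[int]:
    """scores[i][k] is binary classifier k's confidence that sample i belongs
    to class k; predict the class whose classifier is most confident."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K)."""
    s = len(y_true)                                  # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified
    t_k = Counter(y_true)                            # true count per class
    p_k = Counter(y_pred)                            # predicted count per class
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


y_true = [0, 1, 2, 2]
scores = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.5, 0.2, 0.3]]
y_pred = one_vs_rest_predict(scores)
print(y_pred)  # [0, 1, 2, 0]
print(accuracy(y_true, y_pred), multiclass_mcc(y_true, y_pred))  # 0.75 0.7
```

In the paper's setting, each of the 16 rows produced by `one_vs_rest_targets` would serve as the training target for one binary classifier, and each column of `scores` would hold that classifier's outputs.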

DOI: 10.1021/acs.jcim.9b01012
PubMed: 31944107




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author>
<name sortKey="Wang, Zongyu" sort="Wang, Zongyu" uniqKey="Wang Z" first="Zongyu" last="Wang">Zongyu Wang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="He, Wenying" sort="He, Wenying" uniqKey="He W" first="Wenying" last="He">Wenying He</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Tang, Jijun" sort="Tang, Jijun" uniqKey="Tang J" first="Jijun" last="Tang">Jijun Tang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Guo, Fei" sort="Guo, Fei" uniqKey="Guo F" first="Fei" last="Guo">Fei Guo</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2020">2020</date>
<idno type="RBID">pubmed:31944107</idno>
<idno type="pmid">31944107</idno>
<idno type="doi">10.1021/acs.jcim.9b01012</idno>
<idno type="wicri:Area/PubMed/Corpus">000287</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000287</idno>
<idno type="wicri:Area/PubMed/Curation">000287</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000287</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000182</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000182</idno>
<idno type="wicri:Area/Ncbi/Merge">002481</idno>
<idno type="wicri:Area/Ncbi/Curation">002481</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">002481</idno>
<idno type="wicri:Area/Main/Merge">000186</idno>
<idno type="wicri:Area/Main/Curation">000183</idno>
<idno type="wicri:Area/Main/Exploration">000183</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author>
<name sortKey="Wang, Zongyu" sort="Wang, Zongyu" uniqKey="Wang Z" first="Zongyu" last="Wang">Zongyu Wang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="He, Wenying" sort="He, Wenying" uniqKey="He W" first="Wenying" last="He">Wenying He</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Tang, Jijun" sort="Tang, Jijun" uniqKey="Tang J" first="Jijun" last="Tang">Jijun Tang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Guo, Fei" sort="Guo, Fei" uniqKey="Guo F" first="Fei" last="Guo">Fei Guo</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of chemical information and modeling</title>
<idno type="eISSN">1549-960X</idno>
<imprint>
<date when="2020" type="published">2020</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast is a single-celled fungus and a vital model organism for studying transcription and translation in basic biology. The transcriptional control of yeast cells has been extensively studied with both traditional methods and high-throughput technologies. However, the identities of the transcription factors that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that can accurately identify efficient transcription factor binding sites among a large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs, covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. Then, we employed five feature representation methods and adopted four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and applied a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass classification problem. In our experiments, the method achieved the best performance, with an overall accuracy of 79.72% and a Matthews correlation coefficient of 0.77. We identified similarity relationships among the categories of different TF families and obtained sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
<settlement>
<li>Tianjin</li>
</settlement>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Wang, Zongyu" sort="Wang, Zongyu" uniqKey="Wang Z" first="Zongyu" last="Wang">Zongyu Wang</name>
</noRegion>
<name sortKey="Guo, Fei" sort="Guo, Fei" uniqKey="Guo F" first="Fei" last="Guo">Fei Guo</name>
<name sortKey="He, Wenying" sort="He, Wenying" uniqKey="He W" first="Wenying" last="He">Wenying He</name>
<name sortKey="Tang, Jijun" sort="Tang, Jijun" uniqKey="Tang J" first="Jijun" last="Tang">Jijun Tang</name>
</country>
</tree>
</affiliations>
</record>
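The TEI record above can also be read with a standard XML library. The following is a minimal Python sketch that assumes only the structure visible in this record. The record uses the `wicri:` and `nlm:` prefixes without declaring them, so a namespace-aware parser such as `xml.etree.ElementTree` rejects it as-is. The sketch therefore wraps the record in an element that declares both prefixes with placeholder URIs. `urn:example:wicri` and `urn:example:nlm` are invented for this example and are not the real Wicri namespaces. The helper names `parse_record` and `summarize` are hypothetical, and the embedded sample is a trimmed copy of the record.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): any unique strings make the
# otherwise undeclared wicri: and nlm: prefixes parseable.
NS = {"wicri": "urn:example:wicri", "nlm": "urn:example:nlm"}


def parse_record(record_xml: str) -> ET.Element:
    """Wrap a <record> fragment (without an XML declaration) in an element
    that declares the missing prefixes, then return the <record> element."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    wrapper = ET.fromstring(f"<wrapper {decls}>{record_xml}</wrapper>")
    return wrapper.find("record")


def summarize(record: ET.Element) -> dict:
    """Pull the title, author names, PMID and DOI out of the TEI header."""
    title_stmt = record.find("TEI/teiHeader/fileDesc/titleStmt")
    pub = record.find("TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [a.findtext("name") for a in title_stmt.findall("author")],
        "pmid": pub.findtext("idno[@type='pmid']"),
        "doi": pub.findtext("idno[@type='doi']"),
    }


# Trimmed copy of the record shown above, kept short for the example.
SAMPLE = """<record><TEI><teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en">Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.</title>
<author><name sortKey="Wang, Zongyu">Zongyu Wang</name>
<affiliation wicri:level="1"><nlm:affiliation>Tianjin University</nlm:affiliation></affiliation></author>
<author><name sortKey="Guo, Fei">Fei Guo</name></author>
</titleStmt>
<publicationStmt>
<idno type="pmid">31944107</idno>
<idno type="doi">10.1021/acs.jcim.9b01012</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

info = summarize(parse_record(SAMPLE))
print(info["pmid"], info["doi"], info["authors"])
```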

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000183 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000183 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:31944107
   |texte=   Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:31944107" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021